Machine-printed and hand-written text lines identification

Authors

  • Umapada Pal
  • Bidyut Baran Chaudhuri
Abstract

There are many types of documents in which machine-printed and handwritten texts appear intermixed. Since the optical character recognition (OCR) methodologies for machine-printed and handwritten texts are different, to achieve optimal performance it is necessary to separate these two types of texts before feeding them to their respective OCR systems. In this paper, we present a machine-printed and handwritten text classification scheme for Bangla and Devnagari, the two most popular Indian scripts. The scheme is based on the structural and statistical features of the machine-printed and handwritten text lines. The classification scheme has an accuracy of 98.6%.

Similar resources

Character Recognition using RCS with Neural Network

Handwritten Tamil character recognition refers to the process of converting handwritten Tamil characters into Unicode Tamil characters. The scanned image is segmented into paragraphs using a spatial space detection technique, paragraphs into lines using a vertical histogram, lines into words using a horizontal histogram, and words into character image glyphs using a horizontal histogram. The extracte...

Ocr-the 3 Layered Approach for Decision Making State and Identification of Telugu Hand Written and Printed Consonants and Conjunct Consonants by Using Advanced Fuzzy Logic Controller

Optical character recognition is the digitization of handwritten, typewritten, or printed text into machine-encoded form, and it serves a wide range of applications in everyday life. Today OCR is used successfully in finance, legal, banking, health care, and home appliances. India is a country of many cultures, literatures, and traditional scripts. ...

Devanagari Character Recognition towards Natural Human Computer Interaction

Human-computer interaction is a growing research area. There are several ways of interacting with a computer. Handwriting has persisted as a means of communication and of recording information in day-to-day life, even with the introduction of new technologies. Due to the growth of technology in India, it becomes important to devise ways that allow people to communicate with compute...

Density Based Script Identification of a Multilingual Document Image

The field of automatic pattern recognition has witnessed enormous growth in the past few decades. As an essential element of pattern recognition, document image analysis is the procedure of analyzing a document image in order to work out its contents so that they can be manipulated as required at various levels. It involves various procedures like document classification, o...

Probabilistic Random Field Based Method for Annotated Machine Printed Documents Preprocessing

Today, the convenience of search, both on the personal computer hard disk and on the web, is essentially limited to machine-printed text documents and images because of the poor accuracy of handwriting recognizers. The proposed research will advance the state-of-the-art in realizing search of hand-annotated documents. We will primarily target machine-printed documents which have been annotated ...

Journal:
  • Pattern Recognition Letters

Volume 22

Publication date: 2001